First, we load some re-usable functions for our activity. You can inspect, edit, and add to the functions by opening the relevant .R script. But for now, we can just source them.
source("R//functions//functions.R")
Let’s now load the necessary libraries for this activity:
library(sf)
library(ggplot2)
library(tmap)
library(dplyr)
library(scales)
library(leaflet)
library(rnaturalearth)
library(rnaturalearthdata)
library(htmlwidgets)
Now, let’s start by using our file_grabber() function to download some data from the London Data Store.
# Download and unzip our chosen shapefiles from the London Data Store
file_grabber(file_name_dl = "statistical-gis-boundaries-london.zip",
file_name_final = "MSOA_2011_London_gen_MHW.shp",
file_path = "/Users/shujaali/gv330_2425/Seminars/seminar5_spatialData/data/",
url = "https://data.london.gov.uk/download/statistical-gis-boundary-files-london/9ba8c833-6370-4b11-abdc-314aa020d5e0",
compressed = TRUE)
## [1] "File already downloaded."
## [1] "File unzipped."
Now that we have downloaded the data, let’s load it into R using the
sf package. Here will we not focus on functional
programming, so that we can inspect our objects manually and learn about
them.
When handling spatial data, we need to be especially attuned to the coordinate reference system (CRS) of the data. This is because the CRS determines how the data, which are actually in three dimensions, is projected onto a 2D plane. If you just work with one shapefile the CRS isn’t so important (though it will change what any map you render looks like), but the moment you combine multiple spatial data sources, you need to make sure they are in the same CRS or you can accidentally end up with data that is misaligned.
# Read the shapefile
shapefile_name <- "MSOA_2011_London_gen_MHW.shp"
london_msoa_2011 <- st_read(paste0("data/shapefiles/statistical-gis-boundaries-london/ESRI/",shapefile_name))
## Reading layer `MSOA_2011_London_gen_MHW' from data source
## `/Users/shujaali/gv330_2425/Seminars/seminar5_spatialData/data/shapefiles/statistical-gis-boundaries-london/ESRI/MSOA_2011_London_gen_MHW.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 983 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 503574.2 ymin: 155850.8 xmax: 561956.7 ymax: 200933.6
## Projected CRS: OSGB36 / British National Grid
# Look at the data -- note, it doesn't look like a regular tibble or data frame, there is other info there.
london_msoa_2011
## Simple feature collection with 983 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 503574.2 ymin: 155850.8 xmax: 561956.7 ymax: 200933.6
## Projected CRS: OSGB36 / British National Grid
## First 10 features:
## MSOA11CD MSOA11NM LAD11CD LAD11NM RGN11CD
## 1 E02000001 City of London 001 E09000001 City of London E12000007
## 2 E02000002 Barking and Dagenham 001 E09000002 Barking and Dagenham E12000007
## 3 E02000003 Barking and Dagenham 002 E09000002 Barking and Dagenham E12000007
## 4 E02000004 Barking and Dagenham 003 E09000002 Barking and Dagenham E12000007
## 5 E02000005 Barking and Dagenham 004 E09000002 Barking and Dagenham E12000007
## 6 E02000007 Barking and Dagenham 006 E09000002 Barking and Dagenham E12000007
## 7 E02000008 Barking and Dagenham 007 E09000002 Barking and Dagenham E12000007
## 8 E02000009 Barking and Dagenham 008 E09000002 Barking and Dagenham E12000007
## 9 E02000010 Barking and Dagenham 009 E09000002 Barking and Dagenham E12000007
## 10 E02000011 Barking and Dagenham 010 E09000002 Barking and Dagenham E12000007
## RGN11NM USUALRES HHOLDRES COMESTRES POPDEN HHOLDS AVHHOLDSZ
## 1 London 7375 7187 188 25.5 4385 1.6
## 2 London 6775 6724 51 31.3 2713 2.5
## 3 London 10045 10033 12 46.9 3834 2.6
## 4 London 6182 5937 245 24.8 2318 2.6
## 5 London 8562 8562 0 72.1 3183 2.7
## 6 London 8791 8672 119 50.6 3441 2.5
## 7 London 11569 11564 5 81.5 4591 2.5
## 8 London 8395 8376 19 87.4 3212 2.6
## 9 London 8615 8615 0 76.8 3292 2.6
## 10 London 6187 6086 101 38.8 2289 2.7
## geometry
## 1 MULTIPOLYGON (((531667.6 18...
## 2 MULTIPOLYGON (((548881.6 19...
## 3 MULTIPOLYGON (((549102.4 18...
## 4 MULTIPOLYGON (((551550 1873...
## 5 MULTIPOLYGON (((549099.6 18...
## 6 MULTIPOLYGON (((549819.9 18...
## 7 MULTIPOLYGON (((548171.4 18...
## 8 MULTIPOLYGON (((546855 1863...
## 9 MULTIPOLYGON (((549618.8 18...
## 10 MULTIPOLYGON (((550244.1 18...
# It turns out that some (just 4) of the polygons are not valid (i.e., they don't join up properly).
sum(!st_is_valid(london_msoa_2011)) # check if any issues? 4 exist
## [1] 4
london_msoa_2011 <- st_make_valid(london_msoa_2011) # correct them
sum(!st_is_valid(london_msoa_2011)) # check again?
## [1] 0
# Make a very boring map - behold, this is London!
plot(london_msoa_2011$geometry)
# Set a global variable that will be our project CRS, based on the London shapefile crs
proj_crs <- st_crs(london_msoa_2011)
proj_crs$input
## [1] "OSGB36 / British National Grid"
Let’s start by making a very simple map of population density by MSOA. For this, a choropleth map is a good choice as the variable. That is because the variable we are going to map is scaled by area and thus invariant to the visual area of the unit we are mapping. We’ll do this two ways, first using ggplot2, and then using tmap.
# Using ggplot2 -- note, ggplot2 maps are by default quite visually noisy, so we remove a lot of that using theme_void()
choropleth_map_gg <-
ggplot() +
geom_sf(data = london_msoa_2011, aes(fill = POPDEN), colour = "gray30") + # Choropleth map with POPDEN variable
ggtitle("Population Density in London MSOAs") +
scale_fill_viridis_c(name = "Population Density", labels = scales::comma) + # Choose a viridis colour scale
theme_void() + # Use theme_void() for minimalist style
theme(plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm")) + # Reduce margins a bit
labs(fill = "Population Density") # Customize legend title
choropleth_map_gg
# We can easily save ggplot map output:
ggsave("output/choropleth_map_gg.pdf", choropleth_map_gg, width = 9, height = 6)
Now let’s use tmap:
# The same map using tmap, which defaults to a much cleaner presentation, so we don't have to do too much (just switch off the frame):
choropleth_map_tm <-
tm_shape(london_msoa_2011) +
tm_fill(col = "POPDEN", title = "Population Density", style = "cont", palette = "viridis") +
tm_borders(col = "gray30") +
tm_layout(title = "Population Density in London MSOAs", inner.margins = c(0.1, 0.1, 0.1, 0.1), frame = FALSE)
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: instead of `style = "cont"`, use fill.scale =
## `tm_scale_continuous()`.
## ℹ Migrate the argument(s) 'palette' (rename to 'values') to
## 'tm_scale_continuous(<HERE>)'
## [v3->v4] `tm_polygons()`: use 'fill' for the fill color of polygons/symbols
## (instead of 'col'), and 'col' for the outlines (instead of 'border.col').
## [v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(title = )`
choropleth_map_tm
# We can also easily save ggplot map output:
tmap_save(choropleth_map_tm, "output/choropleth_map_tm.pdf", width = 9, height = 6)
## Map saved to /Users/shujaali/gv330_2425/Seminars/seminar5_spatialData/output/choropleth_map_tm.pdf
## Size: 9 by 6 inches
Choropleth maps are good for area-invariant metrics, but what if we want to map something like absolute population (not density)? Let’s quickly see what happens…
tm_shape(london_msoa_2011) +
tm_fill(col = "USUALRES", title = "Usual Residents", style = "cont", palette = "viridis") +
tm_borders(col = "gray30") +
tm_layout(title = "Usual Residents in London MSOAs", inner.margins = c(0.1, 0.1, 0.1, 0.1), frame = FALSE)
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: instead of `style = "cont"`, use fill.scale =
## `tm_scale_continuous()`.
## ℹ Migrate the argument(s) 'palette' (rename to 'values') to
## 'tm_scale_continuous(<HERE>)'
## [v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(title = )`
This becomes visually confusing – the colors (which give us an absolute unscaled metric) don’t make sense, given what we know (or at least expect) about London’s population distribution. What’s going on? Well, the magnitude of the variable is not invariant to the area of the unit we are mapping. Central parts of London will have very high population counts, but also very small areas.
An alternative approach would be scaled points. This is a good choice when you want to show the absolute magnitude of a variable, but you don’t want the visual area of the unit to distort the visual representation.
# First, pick out the centroids of the MSOAs, creating a new object
centroids <- st_centroid(london_msoa_2011)
## Warning: st_centroid assumes attributes are constant over geometries
# Second, plot both the polygon (for reference) and the points, scaled to USUALRES. Here, with ggplot:
scaled_points_map_gg <-
ggplot() +
geom_sf(data = london_msoa_2011, fill = "transparent", color = "gray30", alpha = 0.7) + # Outline of MSOAs for reference
geom_sf(data = centroids, aes(size = USUALRES), color = "blue", alpha = 0.5) + # Points at centroids, size scaled to USUALRES by aes()
scale_size_continuous(name = "Usual Residents", range = c(0.1, 3), breaks = scales::breaks_pretty(n = 5)) + # Use range to set lower and upper limits of point size
ggtitle("Population by MSOA in London") +
theme_void() + # Use theme_void() for minimalist style
theme(plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm")) # Reduce margins a bit
scaled_points_map_gg
ggsave("output/scaled_points_map_gg.pdf", scaled_points_map_gg, width = 9, height = 6)
scaled_points_map_tm <-
tm_shape(london_msoa_2011) + # Layer 1 is the msoa polygon
tm_borders(col = "gray30") + # Outline of MSOAs
tm_shape(centroids) + # Layer 2 is the centroids. Note how this works differently to ggplot. We go tm_shap() for the shapefile, then add grammar below.
tm_bubbles(size = "USUALRES", col = "blue", style = "pretty", alpha = 0.5, title.size = "Usual Residents", scale = 0.6) + # Points at centroids, size scaled to USUALRES
tm_layout(title = "Population by MSOA in London", inner.margins = c(0.1, 0.1, 0.1, 0.1), frame = FALSE) # Set map title
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_bubbles()`: instead of `style = "pretty"`, use fill.scale =
## `tm_scale_intervals()`.
## ℹ Migrate the argument(s) 'style' to 'tm_scale_intervals(<HERE>)'
## [v3->v4] `tm_bubbles()`: use `fill_alpha` instead of `alpha`.
## [v3->v4] `tm_tm_bubbles()`: migrate the argument(s) related to the scale of the
## visual variable `size` namely 'scale' (rename to 'values.scale') to size.scale
## = tm_scale_continuous(<HERE>).
## ℹ For small multiples, specify a 'tm_scale_' for each multiple, and put them in
## a list: 'size.scale = list(<scale1>, <scale2>, ...)'
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(title = )`
scaled_points_map_tm
tmap_save(scaled_points_map_tm, "output/scaled_points_map_tm.pdf", width = 9, height = 6)
## Map saved to /Users/shujaali/gv330_2425/Seminars/seminar5_spatialData/output/scaled_points_map_tm.pdf
## Size: 9 by 6 inches
# Download the data from the London Datastore. We can re-use our function:
file_grabber(file_name_dl = "Rapid_charging_points.gpkg", # this is a .gpkg geo-package, not a .shp shapefile
file_path = "data/",
url = "https://data.london.gov.uk/download/electric_vehicle_charging_site/8ef9c743-c01d-4329-8239-8f858ff4de53",
compressed = FALSE)
## [1] "File already downloaded."
## [1] "File already unzipped or no need for unzipping."
# Ingest rapid charging points data
rapid_chargers <- st_read("data/Rapid_charging_points.gpkg")
## Reading layer `Rapid_charging_points' from data source
## `/Users/shujaali/gv330_2425/Seminars/seminar5_spatialData/data/Rapid_charging_points.gpkg'
## using driver `GPKG'
## Simple feature collection with 156 features and 12 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 505012.1 ymin: 157855.5 xmax: 555472.3 ymax: 199091.7
## Projected CRS: unnamed
rapid_chargers
## Simple feature collection with 156 features and 12 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 505012.1 ymin: 157855.5 xmax: 555472.3 ymax: 199091.7
## Projected CRS: unnamed
## First 10 features:
## borough easting latitude longtitude
## 1 Richmond 520008.8994827288 51.464253 -0.273817
## 2 Newham 543605.6261034013 51.523421 0.068574
## 3 Brent 524592.31514180696 51.546586 -0.204599
## 4 Redbridge 540514.75320855307 51.591184 0.027028
## 5 Greenwich 540531.48913699761 51.476538 0.022182
## 6 Haringey 530017.97962389176 51.579178 -0.125007
## 7 Hammersmith & Fulham 522999.1943170313 51.519975 -0.228609
## 8 Richmond 520271.88623687194 51.464093 -0.270037
## 9 Westminster 526373.98381476535 51.511661 -0.180296
## 10 Westminster 526115.96909061982 51.523033 -0.183562
## northing numberrcpoints objectid siteid
## 1 175332.84072642814 1 1 2
## 2 182528.53009116615 2 2 12
## 3 184604.23447573709 1 3 63
## 4 189983.24549439573 1 35 558
## 5 177225.12555734196 1 4 74
## 6 188367.05681860622 2 5 78
## 7 181604.36517970893 3 6 87
## 8 175321.24338926666 1 7 129
## 9 180762.05435913772 1 8 169
## 10 182021.04071718361 1 9 174
## sitename taxipublicuses
## 1 Upper Richmond Road west of Coval Rd Public use
## 2 Claps Gate Lane Retail park Public use
## 3 1 Christchurch Avenue, Kilburn Station, NW6 7QP Taxi
## 4 South Woodford Station Car Park Taxi
## 5 Old Dover Road west of Dornberg Close - car park Public use
## 6 Crouch Hall Rd opp Bryanstone Road Public use
## 7 Scrubbs Lane - Wood Lane Car Park Public use
## 8 A205 Upper Richmond Road West east of Carlton Road. Public use
## 9 Lancaster Gate Taxi
## 10 Warwick Avenue by Clifton Villas Taxi
## url
## 1 https://tfl.gov.uk/modes/driving/electric-vehicles-and-rapid-charging
## 2 https://tfl.gov.uk/modes/driving/electric-vehicles-and-rapid-charging
## 3 https://tfl.gov.uk/modes/driving/electric-vehicles-and-rapid-charging
## 4 https://tfl.gov.uk/modes/driving/electric-vehicles-and-rapid-charging
## 5 https://tfl.gov.uk/modes/driving/electric-vehicles-and-rapid-charging
## 6 https://tfl.gov.uk/modes/driving/electric-vehicles-and-rapid-charging
## 7 https://tfl.gov.uk/modes/driving/electric-vehicles-and-rapid-charging
## 8 https://tfl.gov.uk/modes/driving/electric-vehicles-and-rapid-charging
## 9 https://tfl.gov.uk/modes/driving/electric-vehicles-and-rapid-charging
## 10 https://tfl.gov.uk/modes/driving/electric-vehicles-and-rapid-charging
## runtime geom
## 1 08/29/2019 POINT (520008.9 175332.8)
## 2 08/29/2019 POINT (543605.6 182528.5)
## 3 08/29/2019 POINT (524592.3 184604.2)
## 4 08/29/2019 POINT (540514.7 189983.2)
## 5 08/29/2019 POINT (540531.5 177225.1)
## 6 08/29/2019 POINT (530017.9 188367)
## 7 08/29/2019 POINT (522999.2 181604.3)
## 8 08/29/2019 POINT (520271.9 175321.2)
## 9 08/29/2019 POINT (526373.9 180762)
## 10 08/29/2019 POINT (526115.9 182021)
Now let’s transform and manipulate our data:
# Set the CRS of the main polygon shapefile (we do not re-project here)
rapid_chargers <- rapid_chargers |>
st_set_crs(proj_crs)
## Warning: st_crs<- : replacing crs does not reproject data; use st_transform for
## that
# For interest, what if we want to find out which MSOA each charger is in?
# Create a new object that conducts a spatial join the rapid chargers data to the shapefile by MSOA name. We use join = st_intersects.
rapid_chargers_MSOA <- st_join(rapid_chargers, london_msoa_2011, join = st_intersects)
# Check that we haven't lost or duped rows -- don't skip this step!
nrow(rapid_chargers_MSOA) == nrow(rapid_chargers)
## [1] TRUE
# Note: What if we joined the other way? This is the Bad Place!
bad_join <- st_join(london_msoa_2011, rapid_chargers, join = st_intersects)
nrow(bad_join) == nrow(london_msoa_2011)
## [1] FALSE
# What if instead we wanted to count the number of chargers in each MSOA? And then create a binary indicator for presence of at least one?
london_msoa_2011_counts <- london_msoa_2011 |>
transform(charger_count = lengths(st_intersects(london_msoa_2011, rapid_chargers))) |>
transform(has_charger = ifelse(charger_count>0,"Yes","No"))
# Check the join:
nrow(london_msoa_2011_counts) == nrow(london_msoa_2011)
## [1] TRUE
# Plot charging stations and London boundaries together using ggplot2, diff colours by number of charge points
rapid_chargers_gg <-
ggplot() +
geom_sf(data = london_msoa_2011, fill = "transparent", color = "gray30") + # Outline of MSOAs for reference
geom_sf(data = rapid_chargers, aes(color = numberrcpoints), size = 3, alpha = .8) +
ggtitle("Rapid Charging Stations in London") +
theme_void() + # Use theme_void() for minimalist style
theme(plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm")) + # Reduce margins a bit
labs(color = "Number of Charge Points") # Relabel the legend
rapid_chargers_gg
# Plot charging stations and London boundaries together using tmap, diff colours by number of charge points
rapid_chargers_tm <-
tm_shape(london_msoa_2011) +
tm_borders(col = "gray30") + # Outline of MSOAs
tm_shape(rapid_chargers) +
tm_dots(col = "numberrcpoints", title = "Number of Charge Points", scale = 4, alpha = .8) + # Adjust dot size
tm_layout(title = "Rapid Charging Stations in London", frame = FALSE) # Set map title
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_dots()`: use `fill_alpha` instead of `alpha`.
## [v3->v4] `tm_tm_dots()`: migrate the argument(s) related to the scale of the
## visual variable `size` namely 'scale' (rename to 'values.scale') to size.scale
## = tm_scale_continuous(<HERE>).
## ℹ For small multiples, specify a 'tm_scale_' for each multiple, and put them in
## a list: 'size.scale = list(<scale1>, <scale2>, ...)'
## [tm_dots()] Argument `title` unknown.
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(title = )`
rapid_chargers_tm
# Choropleth plot where MSOAs with at least 1 charger are highlighted:
rapid_chargers_choro_gg <-
ggplot() +
geom_sf(data = london_msoa_2011_counts, aes(fill = has_charger), color = "gray30") +
theme_void() + # Use theme_void() for minimalist style
theme(plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm")) + # Reduce margins a bit
labs(fill = "Charger Present?") + # Customize legend title
ggtitle("Rapid Charging Coverage in London")
rapid_chargers_choro_gg
# And again, in tmap:
rapid_chargers_choro_tm <-
tm_shape(london_msoa_2011_counts) +
tm_borders(col = "gray30") +
tm_fill(col = "has_charger", title = "Charger Present?") +
tm_layout(title = "Rapid Charging Coverage in London", inner.margins = c(0.1, 0.1, 0.1, 0.1), frame = FALSE) # Set map title
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: migrate the argument(s) related to the legend of the
## visual variable `fill` namely 'title' to 'fill.legend = tm_legend(<HERE>)'[v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(title = )`
rapid_chargers_choro_tm
Let’s try a more challenging case: a dataset of points that is not yet a spatial object (just a .csv).
file_grabber(file_name_dl = "all_schools_xy_2016.csv",
file_path = "data/",
url = "https://data.london.gov.uk/download/london-schools-atlas/57046151-39a0-45d9-8dc0-27ea7fd02de8",
compressed = FALSE)
## [1] "File already downloaded."
## [1] "File already unzipped or no need for unzipping."
# Load schools data
schools <- read.csv("data/all_schools_xy_2016.csv")
We have to be a little careful here, as the chosen project crs (declared above) happens to convert our x and y into metres, rather than using lat/long. We have to take two steps here – first, we identify coordinates in our data frame and apply a CRS (‘4326’), making this an sf object. We then project it into the projected CRS. We didn’t have to do this two-step procedure with rapid chargers as that was already a spatial object.
schools_sf <- schools |>
st_as_sf(coords = c("x", "y"), crs = 4326) |>
st_transform(proj_crs)
schools_sf
## Simple feature collection with 3889 features and 22 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 499101.9 ymin: 151893 xmax: 564318.2 ymax: 205041.7
## Projected CRS: OSGB36 / British National Grid
## First 10 features:
## OBJECTID URN SCHOOL_NAM
## 1 1 135155 Ayesha Siddiqa Girls School
## 2 2 140492 Beis Medrash Elyon
## 3 3 141411 Big Creative Independent School
## 4 4 142336 Wetherby Senior School
## 5 5 100042 St Mary's Kilburn Church of England Primary School
## 6 6 100224 De Beauvoir Primary School
## 7 7 100235 Queensbridge Primary School
## 8 8 100239 Princess May Primary School
## 9 9 100263 Holy Trinity Church of England Primary School
## 10 10 100483 Marlborough Primary School
## TYPE PHASE ADDRESS TOWN
## 1 Other Independent School Not applicable 165-169 The Broadway Southall
## 2 Other Independent School Not applicable 233 West Hendon Broadway London
## 3 Other Independent School Not applicable Silver Birch House Walthamstow
## 4 Other Independent School Not applicable 100 Marylebone Lane London
## 5 Voluntary Aided School Primary Quex Road London
## 6 Community School Primary 80 Tottenham Road London
## 7 Community School Primary Queensbridge Road London
## 8 Community School Primary Princess May Road London
## 9 Voluntary Aided School Primary Richmond Road London
## 10 Community School Primary Draycott Avenue London
## POSTCODE STATUS GENDER EASTING NORTHING WARD_NAME
## 1 UB1 1LR Open Girls 521263 180470 Southall Broadway
## 2 NW9 7DG Open Boys 521939 188148 West Hendon
## 3 E17 5SD Open Mixed 535764 190188 Higham Hill
## 4 W1U 2QB Open Boys 528432 181474 Marylebone High Street
## 5 NW6 4PG Open Mixed 525453 183984 Kilburn
## 6 N1 4BS Open Mixed 533252 184710 De Beauvoir
## 7 E8 3ND Open Mixed 533902 184050 Queensbridge
## 8 N16 8AJ Open Mixed 533512 185478 Stoke Newington Central
## 9 E8 3DY Open Mixed 533659 184613 Queensbridge
## 10 SW3 3AP Open Mixed 527415 178681 Hans Town
## LSOA_NAME LA_NAME
## 1 Ealing 026C Ealing
## 2 Barnet 036F Barnet
## 3 Waltham Forest 014C Waltham Forest
## 4 Westminster 011B Westminster
## 5 Camden 020C Camden
## 6 Hackney 021E Hackney
## 7 Hackney 020E Hackney
## 8 Hackney 014F Hackney
## 9 Hackney 021I Hackney
## 10 Kensington and Chelsea 014C Kensington and Chelsea
## WEBLINK AGE map_icon NEW_URN OLD_URN
## 1 19-Nov NA NA
## 2 16-Nov NA NA
## 3 15 - 16 NA NA
## 4 16-Nov NA NA
## 5 http://www.stmarykilburn.camden.sch.uk/ 11-Mar VOLUNTARY NA NA
## 6 www.debeauvoir.hackney.sch.uk/ 11-Mar STATE-FUNDED NA NA
## 7 www.queensbridge.hackney.sch.uk/ 11-Mar STATE-FUNDED NA NA
## 8 www.princessmay.hackney.sch.uk/ 11-Mar STATE-FUNDED NA NA
## 9 www.holytrinity.hackney.sch.uk/ 11-Mar VOLUNTARY NA NA
## 10 http://www.marlborough.rbkc.sch.uk 11-Mar STATE-FUNDED NA NA
## map_icon_l Primary geometry
## 1 2 0 POINT (512629.7 179975.9)
## 2 2 0 POINT (521936.5 188146)
## 3 2 0 POINT (535682.4 190165)
## 4 2 0 POINT (528429.4 181474.4)
## 5 2 1 POINT (525386.4 183935.4)
## 6 2 1 POINT (533441.4 184685.8)
## 7 2 1 POINT (533822.3 184028.1)
## 8 2 1 POINT (533377.3 185518.7)
## 9 2 1 POINT (533690.4 184403.3)
## 10 2 1 POINT (527412.5 178677.9)
# Let's do a quick bit of data munging with this dataset:
# First, let's create a new variable that focuses in on just the four most common types of schools
schools_sf <- schools_sf |>
transform(TYPE_REDUCED = ifelse(TYPE %in% c('Academy Converter', 'Community School', 'Other Independent School', 'Voluntary Aided School'), TYPE, NA))
# Quick check
table(schools_sf$TYPE_REDUCED)
##
## Academy Converter Community School Other Independent School
## 417 1248 897
## Voluntary Aided School
## 589
# Second, let's fix an absolutely hilarious excel error in the data.
# Excel has taken the variable AGE (which gives the school age range) and interpreted it as a date.
# I.e., cases where the school is 11-19 years, are reported as 19-Nov in the data! Fortunately, we can reverse engineer the right data...
# First, find those cases where this has gone wrong by doing a string search for any of the months (month.abb is build into base R), and create a dummy variable:
# extract all second parts of the age string:
parts <- strsplit(schools_sf$AGE,"-")
schools_sf <- schools_sf |>
transform(AGE_MAX = as.numeric(sapply(schools_sf$AGE, function(date_string) { # Extract the max age as the first item before the -
date_parts <- strsplit(date_string, "-")
return(date_parts[[1]][1])
}))) |>
transform(AGE_MIN = sapply(schools_sf$AGE, function(date_string) { # Extract the min age (which is often incorrectly reported as a month) as the item after the -
date_parts <- strsplit(date_string, "-")
return(date_parts[[1]][2])
})) |>
transform(AGE_CORRUPTED = ifelse(AGE_MIN %in% month.abb, 1, 0))
# Extract numeric values that are still present in AGE_MIN (there are some that excel didn't corrupt)
numeric_values <- schools_sf$AGE_MIN[!schools_sf$AGE_MIN %in% month.abb]
# Define a mapping between month names and their corresponding true numeric values
month_mapping <- c("Jan" = 1, "Feb" = 2, "Mar" = 3, "Apr" = 4, "May" = 5, "Jun" = 6,
"Jul" = 7, "Aug" = 8, "Sep" = 9, "Oct" = 10, "Nov" = 11, "Dec" = 12)
# Add the numeric values we extracted to the mapping
month_mapping <- c(month_mapping, setNames(numeric_values, as.character(numeric_values)))
# Use the mapping to convert month names and numeric values to final numeric values
schools_sf$AGE_MIN <- month_mapping[as.character(schools_sf$AGE_MIN)]
# Create age range variable, and remove excess white space:
schools_sf <- schools_sf |>
transform(AGE_RANGE = ifelse(AGE_CORRUPTED == 1, paste0(AGE_MIN, "-", AGE_MAX), AGE)) |>
transform(AGE_RANGE = gsub(" ", "", AGE_RANGE, fixed = TRUE))
# Quick check
table(schools_sf$AGE_RANGE)
##
## 0-11 0-13 0-6 0-7 1-11 1-13 1-14 1-19 1-5 1-6 1-7 10-16 10-17
## 2 2 1 1 1 2 2 2 1 1 1 10 2
## 10-18 10-19 11-13 11-16 11-17 11-18 11-19 11-20 11-21 12-16 12-18 12-19 13-16
## 10 14 1 135 6 367 102 1 1 3 3 1 6
## 13-17 13-18 13-19 13-20 13-21 14-16 14-18 14-19 14-20 14-21 14-25 14-35 15-16
## 2 7 3 1 1 11 4 22 1 3 1 1 1
## 15-18 15-19 15-24 2-11 2-12 2-13 2-14 2-16 2-17 2-18 2-19 2-5 2-6
## 1 3 1 94 4 36 10 10 2 8 28 1 9
## 2-7 2-8 3-10 3-11 3-12 3-13 3-14 3-15 3-16 3-17 3-18 3-19 3-20
## 19 2 3 1359 12 28 10 2 52 6 78 62 2
## 3-25 3-5 3-6 3-7 3-8 3-9 4-11 4-12 4-13 4-16 4-18 4-19 4-7
## 2 3 4 113 5 2 417 4 42 34 74 36 9
## 4-8 4-9 5-10 5-11 5-12 5-13 5-14 5-16 5-18 5-19 5-21 5-7 6-13
## 2 1 2 133 6 4 8 26 12 18 2 56 4
## 6-14 6-15 6-16 7-11 7-13 7-14 7-16 7-17 7-18 7-19 8-13 8-14 8-16
## 2 2 6 166 12 10 14 2 24 18 4 2 6
## 8-18 8-19 9-16 9-18 9-19
## 6 10 2 4 2
Let’s quickly make a static map of the schools. We can improve our previous mapping by adding a bit of jazz to our presentation…
# Plot schools data using ggplot2
schools_plot_gg <-
ggplot() +
geom_sf(data = london_msoa_2011, fill = "transparent", color = "gray30") + # Set fill to transparent and color to black
geom_sf(data = schools_sf, shape = 19, aes(color = TYPE_REDUCED), alpha = .5, size = 1) +
ggtitle("Schools Locations in London") +
coord_sf() +
theme_void() +
theme(plot.margin = unit(c(0.2, 0.2, 0.2, 0.2), "cm")) +
labs(color = "School Type") # Relabel the legend
schools_plot_gg
# The same map using tmap
schools_plot_tm <-
tm_shape(london_msoa_2011) +
tm_borders(col = "gray30") +
tm_shape(schools_sf) +
tm_dots(shape = 19, col = "TYPE_REDUCED", title = "School Type", alpha = 0.5, scale = 2.5) +
tm_layout(title = "Schools Locations in London", inner.margins = c(0.1, 0.1, 0.1, 0.1), frame = FALSE)
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_dots()`: use `fill_alpha` instead of `alpha`.
## [v3->v4] `tm_tm_dots()`: migrate the argument(s) related to the scale of the
## visual variable `size` namely 'scale' (rename to 'values.scale') to size.scale
## = tm_scale_continuous(<HERE>).
## ℹ For small multiples, specify a 'tm_scale_' for each multiple, and put them in
## a list: 'size.scale = list(<scale1>, <scale2>, ...)'
## [tm_dots()] Argument `title` unknown.
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(title = )`
schools_plot_tm
# We can always jazz up the above maps a little, for example, using tmap:
schools_plot_jazz_tm <-
tm_shape(london_msoa_2011) +
tm_fill(alpha = 0.1, fill = "gray60") +
tm_borders(col= 'white') +
tm_shape(schools_sf) +
tm_dots(shape = 19, col = "TYPE_REDUCED", title = "School Type", alpha = 0.5, scale = 2.5) +
tm_layout(title = "Schools Locations in London",
title.color = 'white',
legend.text.color = 'white',
legend.title.color = 'white',
inner.margins = c(0.1, 0.1, 0.1, 0.1), frame = FALSE, bg.color = 'gray10')
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_polygons()`: use `fill_alpha` instead of `alpha`.[v3->v4] `tm_dots()`: use `fill_alpha` instead of `alpha`.[v3->v4] `tm_tm_dots()`: migrate the argument(s) related to the scale of the
## visual variable `size` namely 'scale' (rename to 'values.scale') to size.scale
## = tm_scale_continuous(<HERE>).
## ℹ For small multiples, specify a 'tm_scale_' for each multiple, and put them in
## a list: 'size.scale = list(<scale1>, <scale2>, ...)'[tm_dots()] Argument `title` unknown.[v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(title = )`
schools_plot_jazz_tm
Finally, let’s make a dynamic/interactive map. This is going to allow us to actually move around the map, zoom in, zoom out, click on individual data points and see their data, etc.
# Let's create an interactive map with a basemap, again using tmap:
# First, set the tmap_mode to viewing:
library(tmap)
library(sf)
tmap_mode('view')
## ℹ tmap mode set to "view".
# Second, create the interactive map:
schools_plot_interactive <-
tm_shape(london_msoa_2011) +
tm_borders() +
tm_shape(schools_sf) +
tm_dots(shape = 19, col = "TYPE_REDUCED", title = "School Type", alpha = 0.5, scale = 2.5,
id = "SCHOOL_NAM", # the scroll-over name
popup.vars = c("AGE_RANGE", "GENDER", "STATUS", "WARD_NAME")) + # variables when you click -- note the hilarious excel error in AGE
tm_layout(title = "Schools Locations in London", inner.margins = c(0.1, 0.1, 0.1, 0.1), frame = FALSE) +
tm_basemap(c('OpenStreetMap'))
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_dots()`: use `fill_alpha` instead of `alpha`.[v3->v4] `tm_tm_dots()`: migrate the argument(s) related to the scale of the
## visual variable `size` namely 'scale' (rename to 'values.scale') to size.scale
## = tm_scale_continuous(<HERE>).
## ℹ For small multiples, specify a 'tm_scale_' for each multiple, and put them in
## a list: 'size.scale = list(<scale1>, <scale2>, ...)'[tm_dots()] Argument `title` unknown.[v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(title = )`
schools_plot_interactive
## Registered S3 method overwritten by 'jsonify':
## method from
## print.json jsonlite
# We can save it as an .html file:
# You can also take a static image using mapshot()
tmap_save(schools_plot_interactive, "output/schools_plot_dynamic.html")
## Interactive map saved to output/schools_plot_dynamic.html
# Revert to plotting mode
tmap_mode('plot')
## ℹ tmap mode set to "plot".